Query-Adaptive R-CNN for Open-Vocabulary Object Detection and Retrieval

Authors

  • Ryota Hinami
  • Shin'ichi Satoh
Abstract

We address the problem of open-vocabulary object retrieval and localization, which is to retrieve and localize objects from a very large-scale image database immediately by a textual query (e.g., a word or phrase). We first propose Query-Adaptive R-CNN, a simple yet strong framework for open-vocabulary object detection. Query-Adaptive R-CNN is a simple extension of Faster R-CNN from closed-vocabulary to open-vocabulary object detection: instead of learning a class-specific classifier and regressor, we learn a detector generator that transforms a text into classifier and regressor weights. All of its components can be learned in an end-to-end manner. Even with its simple architecture, it outperforms all state-of-the-art methods in the Flickr30k Entities phrase localization task. In addition, we propose negative phrase augmentation, a generic approach for exploiting hard negatives in the training of open-vocabulary object detection that significantly improves the discriminative ability of the generated classifier. We show that our system can retrieve and localize objects specified by a textual query from one million images in only 0.5 seconds.
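The detector-generator idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy text encoder `embed_text`, the single linear mapping inside `DetectorGenerator`, the feature dimensions, and the random untrained weights are all assumptions made for illustration. It only shows the data flow: a phrase is turned into classifier and regressor weights, which are then applied to region features to score and rank regions.

```python
import random


def embed_text(phrase, dim=8):
    """Hypothetical stand-in for a text encoder: averages per-word pseudo-random vectors."""
    words = phrase.lower().split()
    vec = [0.0] * dim
    for word in words:
        rng = random.Random(word)  # seeding by the word keeps each word vector deterministic
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    return [v / max(len(words), 1) for v in vec]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class DetectorGenerator:
    """Maps a text vector to classifier weights and four box-regressor weight rows.

    Assumption: one linear map per output, with random untrained weights.
    In the paper these components are learned end-to-end.
    """

    def __init__(self, text_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        self.w_cls = [[rng.gauss(0.0, 0.1) for _ in range(text_dim)]
                      for _ in range(feat_dim)]
        self.w_reg = [[rng.gauss(0.0, 0.1) for _ in range(text_dim)]
                      for _ in range(4 * feat_dim)]

    def generate(self, text_vec):
        cls_weights = matvec(self.w_cls, text_vec)
        flat = matvec(self.w_reg, text_vec)
        # Split the flat output into one weight row per box coordinate offset.
        reg_weights = [flat[k * self.feat_dim:(k + 1) * self.feat_dim]
                       for k in range(4)]
        return cls_weights, reg_weights


def detect(region_feats, cls_weights, reg_weights):
    """Score every region with the generated classifier and rank by score (descending)."""
    results = []
    for idx, feat in enumerate(region_feats):
        score = sum(w * f for w, f in zip(cls_weights, feat))
        deltas = [sum(w * f for w, f in zip(row, feat)) for row in reg_weights]
        results.append((score, idx, deltas))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Usage: five made-up region features queried with a free-form phrase.
feat_rng = random.Random(1)
region_feats = [[feat_rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(5)]
generator = DetectorGenerator(text_dim=8, feat_dim=16)
cls_w, reg_w = generator.generate(embed_text("a red umbrella"))
ranked = detect(region_feats, cls_w, reg_w)
print([idx for _, idx, _ in ranked])  # region indices, best match first
```

Because the classifier is produced from the query at search time, the region features can be computed once per image and reused for any phrase, which is one plausible reading of how the fast retrieval described in the abstract becomes feasible.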

Similar resources

A Modified Grasshopper Optimization Algorithm Combined with CNN for Content Based Image Retrieval

Nowadays, with huge progress in digital imaging, new image processing methods are needed to manage digital images stored on disks. Image retrieval has been one of the most challengeable fields in digital image processing which means searching in a big database in order to represent similar images to the query image. Although many efficient researches have been performed for this topic so far, t...

An Affect-Based Video Retrieval System with Open Vocabulary Querying

Content-based video retrieval systems (CBVR) are creating new search and browse capabilities using metadata describing significant features of the data. An often overlooked aspect of human interpretation of multimedia data is the affective dimension. Incorporating affective information into multimedia metadata can potentially enable search using this alternative interpretation of multimedia con...

Context Aware Query Image Representation for Particular Object Retrieval

The current models of image representation based on Convolutional Neural Networks (CNN) have shown tremendous performance in image retrieval. Such models are inspired by the information flow along the visual pathway in the human visual cortex. We propose that in the field of particular object retrieval, the process of extracting CNN representations from query images with a given region of inter...

Deep Multiple Instance Hashing for Object-based Image Retrieval

Multi-keyword query is widely supported in text search engines. However, an analogue in image retrieval systems, multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple steps separately. In this work, we propose a weakly-supervised Deep Multiple Instance Hashing (DMIH) framework for object-based image retrieval. DMIH integrates o...

Object Detection in Video using Faster R-CNN

Convolutional neural networks (CNN) currently dominate the computer vision landscape. Recently, a CNN based model, Faster R-CNN [1], achieved state-of-the-art performance at object detection on the PASCAL VOC 2007 and 2012 datasets. It combines region proposal generation with object detection on a single frame in less than 200ms. We apply the Faster R-CNN model to video clips from the ImageNet 2...

Journal:
  • CoRR

Volume abs/1711.09509

Publication date: 2017